
323 research outputs found

Melodic track identification in MIDI files considering the imbalanced context

Author: D. Rizo
H.C. Shen
N.V. Chawla
S. Kotsiantis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

In this paper, the problem of identifying the melodic track of a MIDI file in imbalanced scenarios is addressed. A polyphonic MIDI file is a digital score that consists of a set of tracks, where usually only one of them contains the melody and the remaining tracks hold the accompaniment. This leads to a two-class imbalance problem that, unlike in previous work, is managed by over-sampling the melody class (the minority one) or by under-sampling the accompaniment class (the majority one) until both classes are the same size. Experimental results over three different music genres show that learning from balanced training sets clearly provides better results than the standard classification process.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I
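
The balancing step described in the abstract above (over-sampling the minority class or under-sampling the majority class until both classes are the same size) can be sketched as follows. This is a minimal illustration using plain random resampling, which is an assumption: the abstract does not say which resampling algorithms the paper uses. The function name `balance`, its parameters, and the toy track data are hypothetical.

```python
import random


def balance(samples, labels, minority, mode="over", seed=0):
    """Return a class-balanced copy of a two-class data set.

    Assumes `minority` names the smaller class (hypothetical helper,
    plain random resampling, not necessarily the paper's method).
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx or not majority_idx:
        raise ValueError("both classes must be present")

    if mode == "over":
        # Duplicate randomly chosen minority items until the sizes match.
        extra = len(majority_idx) - len(minority_idx)
        minority_idx = minority_idx + [rng.choice(minority_idx) for _ in range(extra)]
    elif mode == "under":
        # Keep a random majority subset as large as the minority class.
        majority_idx = rng.sample(majority_idx, len(minority_idx))
    else:
        raise ValueError("mode must be 'over' or 'under'")

    keep = minority_idx + majority_idx
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy example: one melody track and four accompaniment tracks,
# each described by a single made-up feature value.
tracks = [[0.9], [0.1], [0.2], [0.3], [0.15]]
labels = ["melody", "acc", "acc", "acc", "acc"]
over_x, over_y = balance(tracks, labels, minority="melody", mode="over")
under_x, under_y = balance(tracks, labels, minority="melody", mode="under")
```

With over-sampling the single melody track is repeated until it matches the four accompaniment tracks; with under-sampling only one accompaniment track is kept.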

Index of balanced accuracy: a performance measure for skewed class distributions

Author: J. Huang
M. Sokolova
N. Japkowicz
N.V. Chawla
P.W. Bradley
S. Daskalaki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

This paper introduces a new metric, named Index of Balanced Accuracy, for evaluating learning processes in two-class imbalanced domains. The method combines an unbiased index of overall accuracy with a measure of how dominant the class with the highest individual accuracy rate is. Some theoretical examples illustrate the benefits of the new metric over other well-known performance measures. Finally, a number of experiments demonstrate the consistency and validity of the evaluation method proposed here.

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I
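
The abstract above describes the metric only in words, so the following sketch rests on an assumption: it uses the form in which the Index of Balanced Accuracy is commonly stated, with the product of the two per-class accuracy rates as the unbiased index and their difference as the dominance term. The function name, argument names, and the default weight `alpha=0.1` are illustrative choices, not taken from the abstract.

```python
def index_of_balanced_accuracy(tp, fn, tn, fp, alpha=0.1):
    """IBA for a two-class confusion matrix (assumed common formulation).

    tp, fn: positive-class items classified correctly / incorrectly
    tn, fp: negative-class items classified correctly / incorrectly
    alpha:  weight of the dominance term (illustrative default)
    """
    tpr = tp / (tp + fn)  # accuracy on the positive class
    tnr = tn / (tn + fp)  # accuracy on the negative class
    dominance = tpr - tnr  # which class is favoured, and by how much
    # Product of class accuracies, scaled by the weighted dominance.
    return (1 + alpha * dominance) * tpr * tnr


# 8 of 10 positives and 9 of 10 negatives classified correctly.
score = index_of_balanced_accuracy(tp=8, fn=2, tn=9, fp=1)
```

When both classes are recognised equally well the dominance term vanishes and the value reduces to the product of the two accuracy rates.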

On the suitability of combining feature selection and resampling to manage data complexity

Author: I.T. Jolliffe
M. Basu
N.V. Chawla
S. Kotsiantis
V. García
Z. Zheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The effectiveness of a learning task depends on data complexity (class overlap, class imbalance, irrelevant features, etc.). When more than one complexity factor appears, two or more preprocessing techniques should be applied. Nevertheless, not much effort has been devoted to investigating the importance of the order in which they can be used. This paper focuses on the joint use of feature reduction and balancing techniques, and studies which application order leads to the best classification results. This analysis was made on a specific problem whose aim was to identify the melodic track given a MIDI file. Several experiments were performed from different imbalanced 38-dimensional training sets with many more accompaniment tracks than melodic tracks, and where features were aggregated without any correlation study. Results showed that the most effective combination was the ordered use of resampling and feature reduction techniques.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

A hybrid algorithm to improve the accuracy of support vector machines on skewed data-sets

Author: A. Fernández
B.X. Wang
G. Wu
G.E. Batista
H. Han
N.V. Chawla
R. Akbani
S. García
Z.-Q. Zeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Over the past few years, it has been shown that the generalization power of Support Vector Machines (SVM) falls dramatically on imbalanced data-sets. In this paper, we propose a new method to improve the accuracy of SVM on imbalanced data-sets. To get this outcome, we first used undersampling and SVM to obtain the initial SVs and a sketch of the hyperplane. These support vectors help to generate new artificial instances, which form the initial population of a genetic algorithm. The genetic algorithm improves the population of artificial instances from one generation to another and eliminates instances that produce noise in the hyperplane. Finally, the generated and evolved data were included in the original data-set to minimize the imbalance and improve the generalization ability of the SVM on skewed data-sets.

Crossref

Red Mexicana de Repositorios Institucionales

Repositorio Institucional de la Universidad Autónoma del Estado de México

Empirical comparison of correlation measures and pruning levels in complex networks representing the global climate system

Author: Chawla N.V.
de Alwis Pitts D. A.
Ganguly A.R.
Pelan A.
Steinhaeuser K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

Climate change is an issue of growing economic, social, and political concern. Continued rise in the average temperatures of the Earth could lead to drastic climate change or an increased frequency of extreme events, which would negatively affect agriculture, population, and global health. One way of studying the dynamics of the Earth's changing climate is by attempting to identify regions that exhibit similar climatic behavior in terms of long-term variability. Climate networks have emerged as a strong analytics framework for both descriptive analysis and predictive modeling of the emergent phenomena. Previously, the networks were constructed using only one measure of similarity, namely the (linear) Pearson cross correlation, and were then clustered using a community detection algorithm. However, nonlinear dependencies are known to exist in climate, which raises the question of whether more complex correlation measures are able to capture any such relationships. In this paper, we present a systematic study of different univariate measures of similarity and compare how each affects both the network structure and the predictive power of the clusters. © 2011 IEEE

University of Lincoln Institutional Repository

Crossref
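
The baseline construction mentioned in the abstract above (a network built from Pearson correlation and then pruned) can be sketched as follows. The details are assumptions for illustration: nodes are taken to be locations with equal-length, non-constant time series, and pruning is taken to mean dropping edges whose absolute correlation falls below a threshold. The names `pearson`, `build_network`, and the toy series are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Linear Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def build_network(series, threshold):
    """Return {(node_a, node_b): r} for pairs with |r| >= threshold.

    `series` maps a node name to its time series. Thresholding on the
    absolute correlation is an assumed pruning rule.
    """
    edges = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Toy data: A and B move together, C oscillates independently.
series = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, -1.0, 1.0, -1.0],
}
network = build_network(series, threshold=0.5)
```

Raising the threshold prunes more edges and yields a sparser network; swapping `pearson` for another similarity function is the kind of variation the study compares.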

Use of ensemble based on GA for imbalance problem

Author: G.E. Batista
K. Woods
N.V. Chawla
R. Barandela
R. Jacobs
R.C. Prati
S. Daskalaki
S. Tan
T. Fawcett
T.G. Dietterich
V. Dasarathy
Y. Huang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

In real-world applications, it has been observed that class imbalance (significant differences in class prior probabilities) may substantially degrade classifier performance, in particular on patterns belonging to the less represented classes. One method to tackle this problem consists of resampling the original training set, by over-sampling the minority class and/or under-sampling the majority class. In this paper, we propose two ensemble models (using a modular neural network and the nearest neighbor rule) trained on datasets under-sampled with genetic algorithms. Experiments with real datasets demonstrate the effectiveness of the methodology proposed here.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

Improving Risk Predictions by Preprocessing Imbalanced Credit Data

Author: B. Tian
C. Bunkhumpornpat
C. Phua
D.L. Wilson
G.E.A.P.A. Batista
I. Brown
J. Demšar
J. Laurikkala
K. Kennedy
L.C. Thomas
N. Japkowicz
N.M. Kiefer
N.V. Chawla
P.E. Hart
S.J. Yen
V. Vinciotti
Y.M. Huang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This is a very common situation in real-life credit scoring applications, but it has so far received little attention. This paper investigates whether data resampling can be used to improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

Exploring synergetic effects of dimensionality reduction and resampling tools on hyperspectral imagery data classification

Author: A. Martínez-Usó
D.A. Landgrebe
D.P. Williams
F. Melgani
H. He
I.T. Jolliffe
J.A. Richards
J.C. Platt
J.R. Quinlan
L. Breiman
L. Bruzzone
L.O. Jiménez
M. Hall
M. Kubat
M. Trebar
M. Wasikowski
N. Japkowicz
N.V. Chawla
P.H. Hsu
R. Blagus
S. García
T. Fawcett
V. Kecman
V.N. Vapnik
X. Chen
Z.H. Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The present paper addresses the problem of the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA and a supervised filter are applied to reduce the number of spectral bands. This is a preliminary study that aims to investigate the benefits of combining several techniques to tackle the imbalance and the high dimensionality problems, and also to evaluate the order of application that leads to the best classification performance. Experimental results demonstrate the significance of using these two preprocessing tools together to improve the performance of hyperspectral imagery classification. Although it seems that the most effective order corresponds to first a resampling strategy and then a feature selection (or extraction) algorithm, this is a question that still needs a much more thorough investigation in the future.

This work has partially been supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596 and TIN2009–14205, the Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–04, and the Generalitat Valenciana under grant PROMETEO/2010/02

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I